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Abstract 

In real applications of small area estimation, one often encounters data with 
positive response values. The use of a parametric transformation for positive 
response values in the Fay-Herriot model is proposed for such a case. An asymp¬ 
totically unbiased small area predictor is derived and a second-order unbiased 
estimator of the mean squared error is established using the parametric boot¬ 
strap. Through simulation studies, a finite sample performance of the proposed 
predictor and the MSE estimator is investigated. The methodology is also suc¬ 
cessfully applied to Japanese survey data. 

Keywords: Dual power transformation; empirical Bayes estimation; 
Fay-Herriot model; mean squared error; positive-valued data; small area 
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1. Introduction 

Sample surveys are indispensable to estimate various characteristics of a 
population of interest. However, reliability of estimates from sample surveys 
depends on sample sizes, and direct estimates from small sample surveys have 
large variability, which is known as a small area estimation problem. In small 
area estimation methodology, a model-based approach has become very popular 
to produce indirect and improved estimates by ‘borrowing strength’ from related 
areas. Importance and usefulness of the model-based small area estimation 
approach has been emphasized in the literature. For a recent comprehensive 
review of small area estimation, see Pfeffermann (2013) and Rao and Molina 
(2015). 
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To describe the detailed setting, we define yt as the direct survey estimator 
of the area mean 9i, noting yi is often unstable because of small area sample 
sizes. For producing a reliable estimate of 6i, the most famous and basic small 
area model is the Fay-Herriot (FH) (Fay and Herriot, 1979) described as 

yi = xl(3+ Vi + Si, i = ( 1 . 1 ) 

where Vi ~ iV(0, A) and Si ~ N{0,Di) for known Hi’s, and the quantity of 
interest is 9i = xlj3 + Vi. It is well known that the best predictor 9i that 
minimizes the mean squared error is expressed as 

= nyt + (1 - 


which is a weighted combination of the direct estimator yi and the synthetic 
part x\(3 with weight 7 * = Aji^A + Di). The weight is a decreasing function of 
Di so that the weight on synthetic part x\l3 is large when yi is not reliable, that 
is, the sampling variance Di is large. Since it depends on unknown parameters 
(3 and A, the practical form of 9i is obtained by plugging estimators /3 and A 
into 9i, namely 


Oi = %yi + (1 - 7*)a^‘3 


Ayi + Djxl'^ 

A + Di 


which is called the empirical best linear predictor (EBLUP). 

Until now, the EBLUP and the related topics have been extensively studied 
in the framework of the Fay-Herriot model. Chatterjee et al. (2008) and Diao et 
al. (2014) proposed the empirical Bayes confidence intervals of 9i with second- 
order refinement. Li and Lahiri (2010) and Yoshimori and Lahiri (2014) were 
concerned with the problem of estimating the variance parameter A avoiding 0 
estimate. Moreover, Ghosh et al. (2008) and Sinha and Rao (2009) suggested 
some robust estimating methods for the Fay-Herriot model. The Fay-Herriot 
model and EBLUP are simple and useful methods, but the setting of the Fay- 
Herriot model is sometimes inadequate for analysis of real data. Therefore, 
several extensions of the Fay-Herriot model have been proposed. Opsomer et 
al. (2008) suggested a nonparametric small area model using penalized spline 
regression. In relation to the assumption of known U^’s, Gonzalez-Manteiga 
et al. (2010) proposed a nonparametric procedure for estimating Di, and You 
and Chapman (2006) and Haiti et al. (2014) considered shrinkage estimation 
of Di by assuming a hierarchical structure for Di. Ybarra and Lohr (2008) 
and Arima et al. (2014) were concerned with the problem of measurement 
error in covariate Xi. Datta et al. (2011) and Molina et al. (2015) suggested 
procedures of preliminary testing for existence of the random effect Vi. Datta 
and Mandal (2015) proposed a mixture of two models; one includes a random 
effect and the other does not include a random effect. Although all these papers 
treat important problems, the response values of the data are assumed to be 
normally distributed. 

However, we often encounter positive-valued data (e.g. income, expense), 
which have skewed distributions and non-linear relationships with covariates. 
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For such a data set, the traditional Fay-Herriot model with a linear structure 
between response values and covariates and a normally distributed error term is 
not appropriate. A typical alternative approach is using the log-transformed re¬ 
sponse values as discussed in Slud and Maiti (2006), but the log-transformation 
is not always appropriate and it may produce inefficient and biased prediction 
when the log-transformation is misspecified. Thus, a natural way to solve this 
problem is using a parametric family of transformations which enables us to se¬ 
lect a reasonable transformation based on data. A famous family is the Box-Cox 
transformation (Box and Cox, 1964) defined as 



However, it suffers from a truncation problem that the range of the Box-Cox 
transformation is not the whole real line if A ^ 0, which leads to inconsistency of 
the maximum likelihood estimator of A. Moreover, the inverse transformation 
cannot be defined on whole real line, so that we cannot define a back-transformed 
predictor in the original scale. Alternatively, Yang (2006) suggested a novel 
family of transformations called the dual power transformation (DPT): 



which can be seen as the average of two Box-Cox transformations, namely 
h\{x) = {hx^{x) + h^^{x)}/2. The main advantage of the DPT is that its 
range is the whole real line for all A > 0 so that DPT does not suffer from the 
truncation problem. Sugasawa and Kubokawa (2015) proposed the Fay-Herriot 
model in which response variables are transformed by general parametric trans¬ 
formations. In this paper, we focus on the FH model with DPT transformation 
described as 


h\{yi) = xll3 + Vi + Ei, i = l,...,m. 


( 1 . 2 ) 


where Vi ^ N{0,A) and Ei ^ N{0,Di) for known D^’s. 

Although Sugasawa and Kubokawa (2015) derived EBLUP of 9i = xjfS + Vi 
and the MSE estimator, the parameter of most interest in the model (O is 
IJLi = h'^^{0i) rather than 0i, where is the inverse transformation of DPT: 



Thus, the method developed in Sugasawa and Kubokawa (2015) is not enough 
for practical applications. In this paper, we focus on the prediction of yn with 
its risk evaluation. Specifically, we derive the best predictor of jii as the condi¬ 
tional expectation and the empirical best predictor by plugging the parameter 
estimates in the best predictor. For risk evaluation, we construct a second order 
unbiased MSE estimator based on the parametric bootstrap. 

The paper is organized as follows. In Section[2l we derive the best predictors 
of yii as well as the maximum likelihood estimation of model parameters. A 
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second-order unbiased estimator of the mean squared error of the small area 
predictor is derived based on the parametric bootstrap. In Sections [3] and IH 
we show some simulation studies and empirical applications, respectively. In 
Section [SI we give some concluding remarks. The technical details are given in 
the Appendix. 

2. Transformed Fay-Herriot Model 

2.1. Model setup and best predictor 

We consider the following parametric transformed Fay-Herriot (PTFH) model 
for area-level data: 


hx{yi) = xl(3 + Vi + Si, i = l,...,m 


( 2 . 1 ) 


where Vi ^ iV(0, A), Si ^ N{0,Di) for known Di’s, (3 and Xi are p-dimensional 
vectors of regression coefficients and covariates, respectively. The unknown 
models parameters are denoted by 0 = (/3*,A, A)‘ and we aim to estimate 
(predict) fj,i = with 9i = a;‘/3 + Vi. Note that when A = 0, the model 

reduces to the log-transformed Fay-Herriot model studied by Slud and 
Maiti (2006). 

It is well known that the best predictor of 9i under the squared error loss is 
given by _ 

9^ = 'jihxiyi) + (1 - (2.2) 

where 7 i = A/{A-\-Di). Hence, one possible way to predict yti is using the simple 

back-transformed predictor = h^^{9i). However, is not suitable for 
'^(S) 

predicting /i^, because /r) ' has a non-ignorable bias for predicting /r^, namely 
— Hi] ^0 even when m is large. On the other hand, Slud and Maiti 
(2006) considered the bias corrected predictor of F{9i) for a general function 
F(-), which leads to the following form: 




(SM) 


nh-\9^)] 

E[hx\0.)] 


hxm 


~is) 


It clearly holds that — Hi] = 0, that is, h^^^^ is an unbiased predictor 

of Hi while it does not necessarily minimize the squared error loss. We here 
use the conditional expectation Hi = Fi[h^^{9i)\yi] with known ^ as a predictor 
of /ii, which minimizes the squared error loss. Since 9i\yi ^ Nljli^uf) with 9i 
given in (12.21) and cr^ = ADi/{A + Di) under the model (12.11) . the conditional 
expectation Hi can be expressed as 

poo _ 

Hi = Hi{yH cf}) = h^\t)(l){t-9i,af)dt, (2.3) 


where </)(•; a, h) denotes the density function of N{a, h). It should be noted that 
Jii = when A = 0, namely h^^{x) = exp(a:). However, Jli and are 

not necessarily identical when A > 0. 
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Since the model parameters (p is unknown in practice, we estimate them by 
maximizing the marginal likelihood function, and the details are given in the 
next section. Let (p be the corresponding estimator of <p. Then, replacing cp 
with (p in (j2.3l) leads to the empirical form of pi: 



which is known as empirical best predictor (EBP). Note that pi is no longer 
the conditional expectation but pi converges to pi as m ^ oo under some 
regularity conditions. Since pi cannot be obtained in an analytical form, we 
rely on numerical techniques for computing pi. A typical method is the Monte 
Carlo integration by generating a large numbers of random samples from {9i,af). 
However, we here use Gaussian-Hermite quadrature which is known to be more 
accurate than the Monte Carlo integration. 

2.2. Estimation of model parameters 

Under normality assumption of Vi and £;, it follows that hx {yi) ~ N{x\(3, A+ 
Di) and h\{yi), i = 1,... ,m are mutually independent. Then, the maximum 
likelihood estimator cp ot (p is defined as the maximizer of L((p), where 



(2.4) 


Note that the third term in (12.41) comes from the Jacobian of the transformation. 
When A is given, maximizing (12.41) with respect to /3 and A coincides to maxi¬ 
mizing the log-likelihood function of the classical Fay-Herriot model. Hence, the 
value of profile likelihood function of A is easily computed, so that we may esti¬ 
mate A by grid search over a specified region or golden section method (Brent, 
1973). Though the parameter space of A is [0,oo), it would be sufficient to 
consider the space [0,Am] for moderately large Am- 

For asymptotic properties of the estimator cp, we assume the following con¬ 
ditions. 

Assumption 2.1. 

1. There exist ID. and D independent to m such that D_ < Di < D for i = 

These conditions are usually assumed in the context of small area estima¬ 
tion, see Datta and Lahiri (2000) and Butar and Lahiri (2003). Under these 
conditions, we have the following lemma. 

Lemma 2.1. Under Assumvtion \2.1l as m ^ oo, y/rn{cp — cp) asymptotically 
follows the multivariate normal distribution A(0, V(0)) with a covariance ma¬ 
trix V{cp), and it holds E[(/) — cp] = m~^b{<p) -|- o{m~^) with a smooth function 


b{cp). 
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The asymptotic normality of 4> immediately follows from Sugasawa and 
Kubokawa (2015). Moreover, from the proof of Theorem 1 in Lohr and Rao 
(2009), the bias 6(</>) can be expressed by partial derivatives of T(</>) given in 
(1^ . so that the latter statement in Lemma o follows. 

Other estimators of A are the restricted maximum likelihood estimator 
(Jiang, 1996), the Prasad-Rao estimator (Prasad and Rao, 1990), the Fay- 
Herriot estimator (Fay and Herriot, 1979) and the adjusted maximum likelihood 
estimator (Li and Lahiri, 2010). These methods can be easily implemented and 
their asymptotic properties are discussed in Sugasawa and Kubokawa (2015). 
However, for simplicity, we do not treat these estimators in this paper. 

2.3. Mean squared error of the empirical best predictor 

In small area estimation, mean squared errors (MSEs) of small area estima¬ 
tors are used for risk evaluation, and their importance has been addressed in 
many papers including Lahiri and Rao (1995) and Datta et al. (2005). Follow¬ 
ing this convention, we evaluate the MSE of the empirical best predictor pi. To 
begin with, we note that the MSE can be decomposed as 


MSE, = E[(/x, - /i,)2] = E[(/I, - /r,)2] + E[{pi - p.f] 
= 9u{(f) +92i{(f), 


because pi = E[pi\yi] is the conditional expectation. In what follows, we use 
the explicit notation pi{yi, (p) instead of fii if necessary. The first term gii{4>) 
is expressed as 


5i,(</>)=E {pi{x\l3 + Vi+£u(t))-h^^{x\l3 + Vi)Y , 


which has no analytical expression. The direct Monte Carlo integration by 
generating random samples of Vi and £i requires a large computational burden 
because we need another Monte Carlo integration for computing pi for each 
sample (vi,£i). However, as shown in the Appendix, it turns out to have the 
following more simple expression of gu^cp): 


gii{(p) =E E zi)Y - {x \(3 + ciiZi + C2iZ2)hy^^{x\^ + cuZi - 02 ^ 22 )] , 

_ _ (2.5) 


where zi,Z 2 ^ A^(0,A), Cu = \/{I + ai)/2, and C 2 i = v^(l - a,)/2 for Ui = 
A/[A-\-Di). Hence, gii{p) can be easily calculated by generating a large number 
of random samples of zi and Z 2 - On the other hand, the second term g 2 i{P) can 
be evaluated as the following lemma, where the proof is given in the Appendix. 

Lemma 2.2. Under Assumvtion \2. R it holds 



I -I- o{m ^). 


dpi dpi 
dcp dcp* 


Since the MSE depends on unknown parameter cp, we need to estimate it 
for practical use. To this end, we obtain a second-order unbiased estimator 
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of the MSE. Here, an estimator B is called second order unbiased if E[i?] = 
B + o{m~^). From lemma [2^ it follows that g2i{4>) = + o(to“^) 

with the smooth function ci((/)), thereby the plug-in estimator g 2 ii<P) is second- 
order unbiased. However, the plug-in estimator gii{4>) has a second-order bias 
since gii{cf>) = 0(1), so that we need to correct the bias. Hence, we propose 
the parametric bootstrap method to correct the bias of gii{cf>) and computing 
The procedure is given in the following. 


Parametric bootstrap method for the MSE estimation 

1. Generate bootstrap samples y* from the estimated model; 

%(yi) = + <+£?, i=l,...,TO, (2.6) 

where e* and v* are generated from N{0,Di) and iV(0,H), respectively 

2. Based on {y*, Xi), z = 1,..., m, compute the maximum likelihood estimate 
4> and the predicted values of gi = gi{yi, (f>) and y* = y,i{yi,cf) ). 


3. Derive the bootstrap estimates of gu and g 2 i via 

5 ?r($) = 2ffi.($) - E* \giS*)] , fe(0) = E* Uy* - %) 


where y* = + v* ) and E* [•] denotes the expectation with respect 

to the bootstrap samples generated from (EH). The second-order unbiased 
estimator of the MSE based on the parametric bootstrap is given by 

M^=5h($)+52z($)- (2.7) 


The resulting MSE estimator EH is second-order unbiased as shown in the 
following theorem, which is proved in the Appendix. 

Theorem 2.1. Let MSE^ be the parametric bootstrap MSE estimator given in 
EH- Then, under Assumvtion \2. we have 


E 


MSE, 


MSEi -I- o(m ^), 


where the expectation is taken with respect to yt’s following the model O- 


In (12.7L the bias correction of gii{4>) is carried out via using the addi¬ 
tive form gif{4>) = 2gii{cf>) — E*[gii{(f) )], where E* denotes the expectation 
with respect to bootstrap samples. Hall and Maiti (2006) suggested other bias- 
correcting methods including a multiplicative bias correcting method of the 
form gii{cj})'^/E*[gii(cj) )]. The multiplicative form for bias correction can avoid 
negative estimates of the MSE while the additive form for bias correction gives 
negative estimates of the MSE with a positive probability. Although those bias 
corrections give second-order unbiased estimates of gii{4>), in this paper, we 
use the additive-type bias correction, because it has been frequently used in the 
literatures (e.g. Butar and Lahiri, 2003). 
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3. Simulation Studies 

3.1. Evaluation of prediction errors 

We evaluated prediction errors of the proposed PTFH model and some ex¬ 
isting models. As a data generating process, we considered the following PTFH 
model: 

= Pq + j3iXi + Vi + £i, i = l,...,30, (3.1) 

where Vi ^ N{0,A), Si ^ N{0,Di) with /3 q = l,/3i = 1 and A = 1.5. For A, we 
treated the four cases A = 0.1,0.4,0.7 and 1.0. The covariates Xi were initially 
generated from the uniform distribution on (0,4) and fixed in simulation runs. 
Concerning sampling variance Di, we divided 30 areas into 5 groups (from Gi to 
Gs), and areas within the same group have the same Di value. The Gi-pattern 
we considered was (0.2,0.4,0.6,0.8,1.0). The true small area parameters are 

Mi = + PlXi + Vi). 

For comparison, we considered the log-transformed FH (log-FH) model and 
the traditional Fay-Herriot (FH) model, which are described as 

log-FH: log yi = Po + j3ix, + Vi + £i 
FH: yi = Pq + PiXi + Vi + Ei. 

It is noted that the data generating process en) get close to log-FH as A gets 
smaller. 

Since we do not known the true Di in practice, we computed the estimates of 
y.i with estimated Di as investigated in Bell (2008). To this end, we generated 
the auxiliary observation Zik from the model: 

h\{z^k) = £ik, i = l,...,30, fc=l,...,10, (3.2) 

where Eij ^ N{0,Di). In applying log-FH and FH, we computed the estimates 
of Di as the sampling variances of {logZii,... ,logZiio} and {zn,..., zno}, re¬ 
spectively. Then we computed the estimates of Mi using EBLUP in FH and 
the bias-corrected estimator used in Slud and Maiti (2006) in log-FH, where 
the model parameters /3oj/3i and A are estimated via the maximum likelihood 
method. For fitting PTFH, we first define 

Di = D^{X) =-^^hx{zik) - hx{z)^''^ , hxiz)^ = —^hx{z,k), 

^ k=l ^ k=l 

SO that we regard Di as a function of A and replace Di with Di{X) in (12.411 . 
Since Di{X) can be immediately computed under the given A, we can maximize 
the profile likelihood function of A in the similar manner to that presented in 
Section o Once the estimate A is computed, Di can be calculated as Di{X). 

Based on R = 10000 simulation runs, we computed the coefficient of varia¬ 
tion (CV) and the absolute relative bias (ARB), defined as 


CV, = 






r—1 




(r)2 


and ARBi = 


Mi ~ Mi 

^ 1 
r=l 
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where is the true value and is the estimated value from PTFH, log-FH 
or FH, in the rth iteration. Table [T] shows the percent CV and ARB averaged 
within the same groups for each case of A. For comparison, we also show the 
results for PTFH with true Di values, denoted by PTFH-t in Table [TJ 

From Table [1] we can observe that difference CV or ARB between PTFH-t 
and PTFH tends to be large when A is small and Di is large while tow methods 
perform similarly when A is large or Di is small. Moreover, it is revealed that 
the larger Di would lead to larger CV and ARB values in all the methods. 
Concerning comparison among PTFH, log-FH and FH, it can be seen that 
PTFH performs better than FH except for A = 0.9, and PTFH performs better 
than log-FH except for A = 0.1. Moreover, the differences between PTFH and 
log-FH in A = 0.1 get larger as Di gets larger. Regarding PTFH-t, it performs 
best in most cases. However, it is observed that log-FH and FH produce more 
accurate estimates than PTFH-t in some cases. 

We next investigated the prediction errors when the true distribution of Vi 
is not normal. Here, we considered a t-distribution with 5 degrees of freedom 
for Vi, where the variance is scaled to A, and the other settings for the data 
generation are the same as ill. Under the scenario, we again computed the 
values of CV and ARB of the four methods based on 10000 simulation runs, 
and the results are reported in Tabled It is observed that the simulated RMSE 
values in Table [5] are larger than those in Table [I] due to misspecification of the 
distribution of Vi. However, relationships of CV and ARB among three methods 
are similar to Table [T] 

3.2. Finite sample performance of the MSE estimator 

We next investigated a finite sample performance of the MSE estimator 
(lOl . Following Datta, Rao and Smith (2005), we considered the following data 
generating process without covariates: 


hxivi) = + Vi + 1 = 1,..., 30, 

with fj, = 0, Vi V(0,A) with A = 1 and ^ N(0,Di). As a value of A, 
we considered the three cases A = 0.2, 0.6,1.0. For setting of Di, we divided 
Dfs into five groups Gi,..., Gs, where Dfs were the same values over the same 
group, and the following three patterns of Dfs were considered: 

(a) 0.3,0.4,0.5,0.6,0.7, (&) 0.2,0.4,0.5,0.6,2.0, (c) 0.1,0.4,0.5,0.6,4.0. 

Based on Ri = 5000 simulation runs, we calculated the simulated values of the 
MSE as 


^ r—1 


,(D\2 


(r) 

N 


= hf\po+vn 


■^(r) (r) 

where /i) ' and u) ' are the predicted value and the realized value of Vi in the 
r-th iteration. Then based on i ?2 = 2000 simulation runs, we calculated the 


9 


Table 1: Simulated percent coefficient of variation (CV) and absolute relative biases (ARB) 
of the parametric transformed Fay-Herriot with use of true Di (PTFH-t) and estimated Di 
(PTFH), the log-transformed Fay-Herriot (log-FH) model, and the Fay-Herriot (FH) model 
under A = 0.1, 0.4, 0.7 and 1.0. 





CV 



ARB 



method 

0.1 

0.4 

0.7 

1.0 

0.1 

0.4 

0.7 

1.0 

Gi 

PTFH-t 

46.49 

33.28 

23.49 

17.95 

13.67 

6.89 

3.69 

2.43 


PTFH 

47.46 

32.90 

23.12 

17.53 

13.97 

6.89 

3.51 

2.21 


log-FH 

47.11 

35.34 

28.63 

23.57 

13.16 

4.18 

3.85 

2.39 


FH 

50.24 

32.94 

22.84 

16.87 

9.74 

3.82 

1.78 

1.13 

G2 

PTFH-t 

63.92 

47.64 

37.09 

31.61 

20.20 

11.85 

8.13 

6.69 


PTFH 

66.32 

47.84 

36.92 

31.38 

21.28 

12.24 

8.31 

6.74 


log-FH 

66.79 

53.25 

46.70 

41.76 

22.25 

15.14 

13.96 

13.86 


FH 

82.25 

52.67 

38.61 

31.11 

19.95 

9.16 

6.27 

5.98 

Ga 

PTFH-t 

75.92 

59.42 

48.51 

40.60 

25.67 

16.79 

12.85 

9.59 


PTFH 

79.86 

60.35 

48.22 

40.35 

27.67 

17.74 

13.15 

9.67 


log-FH 

77.70 

60.99 

53.25 

47.15 

26.56 

15.97 

13.04 

11.30 


FH 

116.09 

74.04 

53.70 

40.88 

30.55 

17.10 

11.77 

9.25 

Gi 

PTFH-t 

86.92 

67.82 

53.66 

44.73 

32.29 

20.84 

14.11 

10.82 


PTFH 

92.97 

69.13 

53.38 

43.83 

35.12 

22.13 

14.43 

10.63 


log-FH 

86.81 

65.36 

53.46 

46.36 

30.88 

14.42 

8.29 

8.94 


FH 

156.12 

91.68 

61.04 

45.61 

45.29 

25.10 

15.15 

11.25 

Gs 

PTFH-t 

92.90 

72.74 

59.86 

49.86 

33.87 

22.99 

17.30 

13.04 


PTFH 

101.81 

75.26 

60.39 

49.54 

37.62 

24.73 

18.07 

13.24 


log-FH 

95.91 

71.73 

61.69 

53.58 

34.91 

20.57 

15.30 

12.87 


FH 

198.30 

112.23 

73.57 

53.02 

57.58 

31.04 

19.05 

13.93 


relative bias (RB) and the coefficient of variation (CV) defined as 

1 -^2 /_ ( r ) \ 

— Y , ( mse, - mseA /mse, , 

^ r—1 ^ ^ 

1 ^2 / _^ 2 

— y MSE, - MSE, /MSE^. 

R2 ^ V J 

For calculation of the MSE estimates in each iteration, we used 100 bootstrap 
replication for the MSE estimator and 10000 Monte Carlo samples for computing 
gii- We also investigated the performance of the MSE estimator when we used 
the estimated sampling variances instead of known Di. To this end, similarly 
to the previous section, we generated the auxiliary observation from (03), and 
calculate DiS using these data in each simulation run. Based on the same 
number of simulation runs, we computed the values of RB and CV. Table [3] 
and Table |4] show the maximum, mean and minimum values of RB and CV 


RB, = 

cvy 
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Table 2: Simulated percentage coefficient of variation (CV) and percentage absolute relative 
biases (ARB) of the parametric transformed Fay-Herriot with use of true Di (PTFH-t) and 
estimated Di (PTFH), the log-transformed Fay-Herriot (log-FH) model, and the Fay-Herriot 
(FH) model under A = 0.1, 0.4, 0.7 and 1.0, when the distribution of Vi is a t-distribution with 
5 degrees of freedom. 


CV ARB 



method 

0.1 

0.4 

0.7 

1.0 

0.1 

0.4 

0.7 

1.0 

Gi 

PTFH-t 

47.19 

39.58 

34.16 

28.94 

14.10 

9.77 

7.31 

5.43 


PTFH 

48.42 

39.84 

34.40 

28.34 

14.70 

10.07 

7.38 

5.18 


log-FH 

48.09 

41.08 

39.98 

35.31 

14.00 

8.14 

4.95 

5.65 


FH 

51.54 

41.09 

33.00 

27.84 

10.26 

6.92 

5.18 

4.28 

G2 

PTFH-t 

66.06 

53.12 

41.12 

35.83 

21.41 

14.08 

8.99 

6.92 


PTFH 

68.38 

53.52 

40.86 

35.18 

22.59 

14.78 

9.10 

6.80 


log-FH 

67.79 

56.17 

46.04 

43.20 

21.84 

13.32 

7.32 

6.73 


FH 

83.45 

58.51 

41.73 

34.32 

21.03 

12.63 

7.67 

6.18 

Ga 

PTFH-t 

79.92 

62.21 

49.91 

42.61 

25.50 

16.81 

10.91 

8.69 


PTFH 

83.75 

62.26 

47.41 

41.62 

27.39 

17.60 

11.05 

8.66 


log-FH 

83.53 

65.45 

55.47 

52.08 

27.40 

18.20 

12.93 

13.22 


FH 

121.91 

73.92 

50.11 

41.05 

31.49 

17.11 

10.35 

8.25 

G4 

PTFH-t 

89.82 

78.86 

61.81 

53.21 

30.06 

21.02 

15.63 

12.03 


PTFH 

98.67 

75.36 

59.27 

51.18 

32.81 

22.26 

15.83 

12.01 


log-FH 

99.13 

81.96 

62.46 

57.54 

31.16 

19.98 

13.48 

11.01 


FH 

155.72 

108.51 

65.22 

52.10 

44.14 

25.07 

15.96 

12.80 

Gs 

PTFH-t 

104.87 

86.53 

77.05 

71.23 

34.72 

26.73 

22.00 

17.04 


PTFH 

123.97 

86.69 

74.22 

64.63 

38.87 

28.84 

22.83 

17.06 


log-FH 

120.17 

87.54 

75.41 

67.96 

34.96 

22.70 

15.61 

11.36 


FH 

224.09 

128.60 

87.85 

69.25 

61.85 

40.85 

29.11 

21.47 


within the same group. In both tables, the simulated values of RB and CV of 
the MSE estimator with estimated Di are given in the parenthesis. It is seen 
that the proposed MSE estimator with known Di provides reasonable estimated 
values in almost all cases in terms of both RB and CV. On the other hand, the 
MSE estimator with estimated Di performs worse than the MSE estimator with 
known Di since the former estimator is affected by the variability of estimating 
Di. Moreover, it is observed that performances of both MSE estimators get 
better in the order of Pattern (a), (b) and (c). 

4. Application to Survey Data in Japan 

We consider an application of the proposed method together with some ex¬ 
isting methods to the data from the Survey of Family Income and Expenditure 
(SFIE) in Japan. Especially, we used the data on the spending item ‘Health’ 
and ‘Education’ in the survey in 2014. For the spending item ‘Health’ and 
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Table 3: The percentage RB values of the MSE estimator with known Di and unknown Di 
(Parenthesis) in each group. 




Pattern (a) 


Pattern (b) 


Pattern (c) 

A 

0.2 

0.6 

1.0 

0.2 

0.6 

1.0 

0.2 

0.6 

1.0 

max Gi 

6.3 

17.9 

19.5 

3.9 

7.3 

9.0 

7.2 

1.3 

2.7 


(28.1) 

(44.3) 

(51.4) 

(98.4) 

(80.8) 

(82.6) 

(113.5) 

(47.1) 

(44.1) 

G2 

4.9 

15.1 

17.2 

4.4 

4.4 

5.3 

4.1 

-3.6 

-4.9 


(44.6) 

(69.2) 

(78.8) 

(46.4) 

(107.2) 

(99.8) 

(37.3) 

(29.8) 

(20.8) 

Gs 

12.5 

10.4 

14.6 

-2.8 

3.3 

4.6 

-3.4 

-4.1 

-5.0 


(32.9) 

(40.8) 

(47.0) 

(24.2) 

(12.5) 

(10.6) 

(11.3) 

(37.5) 

(29.7) 

G4 

6.4 

12.6 

17.6 

-2.6 

6.7 

8.9 

-5.0 

-4.5 

-4.8 


(33.8) 

(28.3) 

(40.2) 

(12.1) 

(8.8) 

(8.0) 

(16.1) 

(84.4) 

(75.7) 

Gs 

8.7 

12.4 

16.3 

-1.5 

2.8 

6.1 

-5.3 

-4.2 

-1.6 


(15.9) 

(39.1) 

(52.1) 

(6.2) 

(43.9) 

(43.6) 

(4.6) 

(26.0) 

(19.5) 

mean Gi 

2.5 

7.4 

8.6 

-1.9 

1.4 

2.6 

-2.7 

-3.3 

-3.1 


(22.4) 

(31.4) 

(36.9) 

(33.5) 

(32.3) 

(33.9) 

(47.6) 

(27.7) 

(27.0) 

G2 

-1.0 

7.8 

9.8 

-3.4 

-2.0 

-2.0 

-6.0 

-5.9 

-7.3 


(21.9) 

(33.4) 

(39.9) 

(15.5) 

(21.5) 

(18.8) 

(20.2) 

(4.8) 

(-0.6) 

Gs 

5.2 

1.9 

5.2 

-6.3 

-1.7 

-0.4 

-6.1 

-8.4 

-9.1 


(17.7) 

(21.1) 

(28.7) 

(11.2) 

(3.6) 

(2.4) 

(-0.1) 

(10.4) 

(4.6) 

G4 

3.3 

6.4 

9.9 

-8.0 

-0.4 

1.5 

-7.4 

-7.5 

-7.9 


(20.4) 

(17.7) 

(26.9) 

(2.2) 

(-2.8) 

(-2.2) 

(-5.3) 

(15.6) 

(10.9) 

Gs 

1.4 

5.7 

9.9 

-6.0 

-0.2 

2.4 

-7.0 

-6.8 

-6.4 


(11.1) 

(22.1) 

(32.8) 

(-3.0) 

(6.4) 

(7.2) 

(-6.8) 

(-1.2) 

(-4.4) 

min Gi 

-4.9 

-3.2 

-2.4 

-6.9 

-5.2 

-4.5 

-8.5 

-6.9 

-6.6 


(17.2) 

(15.0) 

(20.2) 

(13.7) 

(16.1) 

(17.3) 

(21.5) 

(17.1) 

(17.0) 

G2 

-5.8 

-0.9 

2.8 

-7.5 

-7.0 

-7.5 

-9.6 

-7.8 

-9.9 


(11.5) 

(14.2) 

(18.6) 

(2.7) 

(-2.7) 

(-4.7) 

(8.1) 

(-5.2) 

(-8.8) 

Gs 

0.9 

-8.2 

-6.7 

-8.7 

-6.6 

-6.5 

-10.3 

-10.4 

-11.3 


(7.0) 

(13.2) 

(19.4) 

(-5.0) 

(-4.4) 

(-4.7) 

(-7.3) 

(-9.4) 

(-13.2) 

G4 

-1.9 

-3.0 

-0.2 

-11.9 

-5.8 

-3.9 

-9.1 

-11.2 

-11.9 


(10.0) 

(6.9) 

(14.1) 

(-7.0) 

(-11.6) 

(-9.5) 

(-12.6) 

(-16.1) 

(-18.2) 

Gs 

-3.4 

-6.4 

-2.4 

-7.9 

-5.4 

-2.8 

-8.5 

-9.3 

-9.8 


(6.4) 

(-0.6) 

(8.8) 

(-7.3) 

(-10.4) 

(-9.7) 

(-16.1) 

(-18.4) 

(-20.0) 


‘Education’, the annual average spending data at each capital city of 47 prefec¬ 
tures are available. The estimates are both unreliable since the sample sizes are 
around 50 for most prefectures. As a covariate, we used data from the National 
Survey of Family Income and Expenditure (NSFIE) for 47 prefectures. Since 
NSFIE is based on much larger sample than SFIE, the reported values are more 
reliable, but this survey has been implemented every hve years. Although the 
joint bivariate modeling of the two items ‘Health’ and ‘Education’ would be 
preferable as proposed in Benavent et al. (2016), we here consider applying 
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Table 4: The CV values of the MSE estimator with known Di and unknown Di (Parenthesis) 
in each group. 




Pattern 

(a) 

Pattern 

(b) 

Pattern 

(c) 

A 

0.2 

0.6 

1.0 

0.2 

0.6 

1.0 

0.2 

0.6 

1.0 

max Gi 

1.15 

1.03 

1.14 

0.78 

0.60 

0.66 

0.77 

0.62 

0.67 


(2.72) 

(1.65) 

(1.96) 

(2.13) 

(1.78) 

(1.89) 

(2.01) 

(1.19) 

(1.29) 

G2 

1.17 

1.33 

1.50 

0.55 

0.73 

0.71 

0.54 

0.50 

0.54 


(2.25) 

(2.50) 

(3.42) 

(1.20) 

(2.44) 

(2.49) 

(0.91) 

(0.84) 

(0.82) 

Ga 

2.25 

0.98 

1.16 

0.59 

0.87 

1.00 

0.52 

0.46 

0.53 


(1.34) 

(1.80) 

(2.16) 

(0.88) 

(1.02) 

(1.1) 

(0.68) 

(0.98) 

(0.97) 

G4 

1.02 

1.26 

1.54 

0.46 

0.84 

0.98 

0.51 

0.50 

0.57 


(1.44) 

(1.40) 

(2.08) 

(0.70) 

(0.94) 

(1.06) 

(0.66) 

(1.72) 

(1.70) 

Gs 

0.72 

1.01 

1.23 

0.67 

0.73 

0.88 

0.47 

0.66 

0.77 


(1.10) 

(1.59) 

(2.21) 

(0.91) 

(1.60) 

(1.87) 

(0.56) 

(0.90) 

(0.91) 

mean Gi 

0.99 

0.74 

0.82 

0.51 

0.50 

0.55 

0.45 

0.41 

0.43 


(1.05) 

(1.33) 

(1.57) 

(1.07) 

(1.07) 

(1.15) 

(1.10) 

(0.86) 

(0.90) 

G2 

0.78 

0.91 

1.06 

0.46 

0.56 

0.63 

0.40 

0.42 

0.47 


(1.23) 

(1.46) 

(1.83) 

(0.80) 

(1.05) 

(1.11) 

(0.73) 

(0.66) 

(0.68) 

Ga 

1.02 

0.86 

1.03 

0.46 

0.61 

0.71 

0.42 

0.41 

0.45 


(1.08) 

(1.27) 

(1.60) 

(0.76) 

(0.81) 

(0.88) 

(0.60) 

(0.70) 

(0.72) 

G4 

0.75 

0.93 

1.13 

0.43 

0.66 

0.78 

0.41 

0.44 

0.50 


(1.10) 

(1.24) 

(1.65) 

(0.64) 

(0.80) 

(0.89) 

(0.56) 

(0.85) 

(0.87) 

Gs 

0.71 

0.92 

1.12 

0.52 

0.66 

0.77 

0.41 

0.49 

0.56 


(0.99) 

(1.36) 

(1.83) 

(0.70) 

(0.94) 

(1.05) 

(0.53) 

(0.69) 

(0.72) 

min Gi 

0.61 

0.58 

0.65 

0.37 

0.41 

0.45 

0.35 

0.35 

0.36 


(0.95) 

(0.93) 

(1.12) 

(0.69) 

(0.86) 

(0.91) 

(0.71) 

(0.73) 

(0.76) 

G2 

0.57 

0.73 

0.86 

0.39 

0.49 

0.56 

0.34 

0.39 

0.41 


(0.89) 

(1.10) 

(1.28) 

(0.64) 

(0.75) 

(0.80) 

(0.57) 

(0.58) 

(0.59) 

Ga 

0.63 

0.79 

0.88 

0.40 

0.52 

0.60 

0.37 

0.39 

0.43 


(0.95) 

(1.11) 

(1.35) 

(0.61) 

(0.72) 

(0.80) 

(0.54) 

(0.55) 

(0.58) 

G4 

0.66 

0.81 

0.99 

0.41 

0.59 

0.69 

0.37 

0.42 

0.45 


(0.97) 

(1.07) 

(1.33) 

(0.61) 

(0.69) 

(0.76) 

(0.49) 

(0.55) 

(0.59) 

Gs 

0.68 

0.80 

1.00 

0.43 

0.58 

0.69 

0.38 

0.42 

0.47 


(0.92) 

(1.06) 

(1.40) 

(0.59) 

(0.74) 

(0.83) 

(0.50) 

(0.55) 

(0.59) 


univariate models separately to each item for simplicity. In what follows, yi and 
Xi denote the direct estimate (scaled by 1000) from SFIE and the covariate (re¬ 
liable estimate) from NSFIE, respectively, on the item ‘Health’ or ‘Education’. 
For each survey data, we applied the PTFH model: 

hxivi) = l3o+ Pi logxi + Vi+ei, i = 1,... ,47, 

where Vi ~ N{0,A), Si ~ iV(0,1?i) and A and A are model parame¬ 

ters. For comparison, we also applied the log-FH model corresponding A = 0 
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in the above model, and the classical Fay-Herriot model. The model param¬ 
eters were estimated by the maximum likelihood method in all models. For 
computing Di in each model, we used the past data for consecutive eight years 
from 2006 to 2013, which are denoted by za for t = 1,..., 8 . In the FH and 
log-FH models, we simply calculated the sampling variance of {^ii,..., Zis} and 
{logZii,..., logZis}, respectively. In the PTFH model, similarly to Section IXTl 
we hrst maximize (12.41) with Di = Di{X) and let Di = Di{\). 

In the PTFH model, we have A = 0.59 in “Education” and A = 0.86 in 
“Health”. Moreover, based on based on 1000 parametric bootstrap samples, 
we obtained 95% confidence intervals of A, (0.20,1.16) in “Education” and 
(0.18,1.99) in “Health”, which indicate the log-transformation might not be 
appropriate. In Figure [U we present the estimated regression lines of the three 
models, noting that y = h^^{l3o + Pix) in PTFH, and y = exp(,5o + Pix) in 
log-FH. From the figure, it is observed that all the regression lines are similar. 
For assessing the suitability of normality assumptions of error terms, we com¬ 
puted the standardized residuals: Ci = Vi/\/A + Di, where is the estimates 
of Vi -I- £i, so that Ti = h^{yi) - Po - PiXt in PTFH, n = logy^ - Po - PiXi in 
log-FH and Xi = yi — Pq — fiiXi in FH, noting that Cj asymptotically follows the 
standard normal distribution if the model specification is correct. In Figure 
we give the estimated density of in each model, which does not strongly sup¬ 
ports the normality assumption of three models, but all the estimated densities 
are close to symmetric. Hence, the normality assumption might be plausible. 
In fact, we calculated the p-value of the Kolmogorov-Smirnov test for normality 
of Ci, presented in Table [5l and found that the normality assumption was not 
rejected in the three models in both items. Moreover, in Table [Sj we provide 
AICs based on the maximum marginal likelihood for the three models. It can 
be seen that AICs of PTFH and log-FH are similar and smaller that that of FH 
in “Education” while that of PTFH is the smallest in “Health”. 

For investigation of goodness-of-fit of the PTFH model, we set Zi = h^{yi) 
and Wi = log Xi , and applied the penalized spline model used in Opsomer et al. 
(2008): 

p K 

Zi = Po + '^(3eWiKiY_^+Vi+£i, i = (4.1) 

i=i i=i 

where Vi ^ (0, A), Si ^ N{0,Di) and ( 71 ,... , 7 if)* ^ N{0,alK), (a:)+ denotes 
the function x^I^x > 0), and ki < ■■■ < kk is a set of fixed knots which 
determine the flexibility of splines. We set K = 20 and took ki and kk as 10% 
and 90% quantiles of wt, respectively, and set K 2 , ■ ■ ■, kk-i at regular interval. 
For the degree of splines, we considered three cases: p = 1,2,3. We estimated 
model parameters /3o,..., /3p, A and a by the maximum likelihood method. In 
Figure [31 we present the estimated regression lines of three penalized spline 
models (p = 1,2,3) as well as that of PTFH, which shows that the linear 
parametric structure in the PTFH model seems plausible and PTFH would fit 
well in both items. 


14 



Education 


Health 




Figure 1: The scatter plots of {xi, yi) with estimated regression lines in the parametric trans¬ 
formed Fay-Herriot (PTFH) model, the log-transformed Fay-Herriot (log-FH) model and the 
classical Fay-Herriot (FH) model. 


Finally, we computed the MSE estimates of the small area estimators for 
the three models. In the PTFH model, we used the estimator given in Theorem 
12.11 with 1000 bootstrap samples and 5000 Monte Carlo samples of Vi and Si 
for numerical evaluation of gu. For the MSE estimates in the log-FH and FH 
models, we used the estimator given in Slud and Maiti (2006) and Datta and 
Lahiri (2000), respectively. We report the small area estimates and MSE esti¬ 
mates in seven prefectures around Tokyo in Tabled It can be seen that log-FH 
and FH produce relatively similar estimates of area means while the estimates 
from PTFH are not similar to those models. Regarding MSE estimates, we can 
observe that the values in PTFH are smaller than the other two models, but we 
cannot directly compare these results since each MSE estimates are calculated 
based on the different sampling variances Di. 


Table 5: AIC and p-value of Kolmogorov-Smirnov (KS) test for normality of standardized 
residuals. 


AIC p-value of KS test 


Data 

PTFH 

log-FH 

FH 

PTFH 

log-FH 

FH 

Education 

Health 

313.1 

172.9 

312.9 

180.7 

314.5 

183.4 

0.577 

0.519 

0.469 

0.440 

0.848 

0.375 
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Health 




Residual 


Residual 


Figure 2: The estimated density of standardized residuals in the parametric transformed Fay- 
Herriot (PTFH) model, the log-transformed Fay-Herriot (log-FH) model and the classical 
Fay-Herriot (FH) model. 


Education 



Health 



log(x) 


log(x) 


Figure 3: The scatter plots of (loga^j, h-^{yi)) with estimated regression lines in the parametric 
transformed Fay-Herriot (PTFH) model and the nonparametric (NP) model based on the 
penalized spline with three orders (p = 1, 2, 3). 
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Table 6: The small area estimates and the root of MSE (RMSE) estimates in seven prefectures 
around Tokyo from four models, the parametric transformed Fay-Herriot (PTFH) model, the 
log-transformed Fay-Herriot (log-FH) model and the classical Fay-Herriot (FH) model. 


Data 

Prefecture 

DE 

Estimates 

PTFH log-FH FH 

PTFH 

RMSE 

log-FH 

FH 


Ibaraki 

21.97 

21.80 

21.54 

21.44 

1.12 

1.99 

1.95 


Tochigi 

21.88 

21.63 

21.21 

21.30 

1.65 

2.41 

2.10 


Gunma 

14.12 

14.76 

15.49 

15.17 

2.74 

3.68 

2.93 

Education 

Saitama 

32.61 

27.41 

23.81 

22.78 

4.49 

4.68 

4.83 


Chiba 

21.55 

20.92 

20.19 

20.08 

3.31 

3.90 

3.87 


Tokyo 

22.04 

21.84 

21.55 

21.06 

1.73 

2.16 

3.12 


Kanagawa 

22.32 

21.87 

21.32 

20.86 

2.37 

2.93 

3.34 


Ibaraki 

10.35 

10.37 

10.67 

10.70 

0.25 

0.83 

0.85 


Tochigi 

11.76 

11.71 

11.34 

11.33 

0.61 

1.08 

1.08 


Gunma 

8.74 

8.88 

10.00 

10.12 

0.51 

1.23 

1.25 

Health 

Saitama 

11.13 

11.13 

11.21 

11.22 

0.19 

0.77 

0.79 


Chiba 

12.81 

12.64 

11.64 

11.64 

0.60 

1.06 

1.07 


Tokyo 

13.80 

13.73 

12.66 

12.53 

0.18 

0.79 

0.85 


Kanagawa 

14.50 

14.45 

13.55 

13.61 

0.10 

0.63 

0.62 


5. Concluding Remarks 

We proposed the use of the dual power transformation in the classical Fay- 
Herriot model for estimating positive valued small area means, which we call 
parametric transformed Fay-Herriot (PTFH) model We derived the best pre¬ 
dictor (BP) and obtained empirical best predictor by replacing unknown model 
parameters in BP with their maximum likelihood estimator. For risk evaluation 
of the predictor, we derived the second unbiased estimator of mean squared er¬ 
rors of the predictor based on the parametric bootstrap. In the simulation study, 
we have numerically evaluated the prediction errors of the proposed method to¬ 
gether with some existing methods. In empirical study, we applied the method 
to the Japanese survey data. 

For checking goodness-of-fit of PTFH, we suggested fitting the penalized 
spline model for the transformed data, as presented in the application. While 
we did not consider estimating the penalized spline and the transformation pa¬ 
rameter simultaneously, estimation of PTFH model with penalized spline could 
be useful. However, the detailed discussion seems exceed the scope of this paper, 
and is left to a valuable future study. Concerning the calculation of the sampling 
variance Di , we only considered the case where the some auxiliary observations 
are available as considered in Section [3] and SI However, due to highly variable 
survey weights and small samples, the sampling variance Di could be badly 
estimated in areas with small samples, and nothing can reassure practitioners 
about what effects might have on the results from the PTFH model. 
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Regarding model companion in Section SI we used the AIC based on the 
maximum marginal likelihood (marginal AIC) for model comparison. However, 
the use of the marginal AIC in the Fay-Herriot model is known to have some 
drawbacks, and Han (2013) and Marhuenda et al. (2014) proposed alternative 
information criteria. However, the use of these criteria in the transformation 
model would not be straightforward due to the variability of estimating the 
additional transformation parameter, thereby the detailed discussion is left to 
a future study. 

Acknowledgement 

We would like to thank the Associate Editor and the four reviewers for 
many valuable comments and helpful suggestions which led to an improved 
version of this paper. We would also like to thank professor J. N. K. Rao 
for helpful comments and suggestions and also thank attendees at the seminar 
in Statistics Canada. This work was supported by JSPS KAKENHI Grant 
Numbers, JP16H07406, JP15H01943 and JP26330036. 


Appendix 


Al. Derivation of (j2.5|) . Note that E[(/ri — ^i)^] = E[/rf] — E[/r?]. From 
(lOl) . it follows that 


= 


6i{u),al)4>{u] x\(5, A+Di)dsdtdu, 


where 9i{u) = aiU+ (1 — ai)x\^ with = A/{A-\- Di). Let S and T be random 
variables mutually independently distributed as N{6i{U),af) under given U = 
u, and let 17 be a random variable distributed as N{xl(3, A + Di). The marginal 
distribution of the vector (5, T)' is 



Then, we have E[^?] = E[/iO^(S')/iO^(T)], where the expectation is taken with 
respect to the marginal distribution of (5, T)'. Introducing random variables 
zi and Z 2 mutually independently distributed as A(0,A), we can express S = 
xlf3 + ciiZi + C 2 iZ 2 and T = xlfB + cuZi — C 2 iZ 2 , thereby we obtain the expression 

E[)i-] = E[h^^{xl(3 + CuZi + C 2 rZ 2 )h'l^ {x\(3 + C^Zx - C 2 iZ 2 )]- 
Since E[/rf] can be expressed as E[^f] = {xl(3 + ^i)}^], we obtain (12.5p . 


A2. Proof of Lemma 12.21 For notational simplicity, we define = 
djli/dcj) and = d^]li/dcf>d(f/. Expanding /li around /li, we get 

%-'iii= - 0) + i($ - ^)‘/f*(00)(?/*; 0*)($ - 0), 
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where </>* is on the line connecting ip and <p. Then, it holds that 


g2i{(p) = E ($ - (pf ($ - P) 


Rl + ^^2) 


where i?i = - p){p - PY^i^(^^){y^;p*)(p - p)] and R 2 = E[{(p - 

(yd P*){P~ ‘Z’)}]- We first show that i?i = o{m~^) and i ?2 = o{m~^). 
We only prove i?i = o{m~^) since the evaluation of R 2 is quite similar. In what 
follows, we define /dpkdpi = p*)/dpkdpe. It follows that 


P+2 p+2 p+2 

j=i k=i 

p+2 p+2 p+2 

j=i k=i e=i 


Y 9/iY 

f d^Jl* \ 

\dpj) 

\dpkdpij 


(pj PjYPk Pk)iYPi 


and 

\Ui]ki \ < E 

< E 


?h\ 

dpj) \dpkdpi) 


E 


iPj PjYP^ pk)YP^ 


dyLi 


dpj 


E 




dpkdpi 


n E 


Ya 


from Holder’s inequality. Erom the asymptotic normality of p given in Lemma 
2.11 it follows E[|(/) — pY] = for arbitrary r > 0. Then, we have 


n E 

ae{j,k,e} 


i^a Ya 


1/4 


= o{m ^). 


Noting that 


dh^ (x) d / r - 

^^s-(Ax+yiT^) 

hY\x) 


l/A 


A 


the straightforward calculation shows that 

'dhY\t) 


Mz(A) 


/ 

J —( 


= 1 ® 


dX 

0^hY\eY 


•\/l + A^cc^ 

1 

p{t-0^,af)dt + 


- i log(Aa; + \/l + A2x2)|, 
dp{t\9i,af) 


hY'it) 


dX 


+ 


x/r+A^ 

1 f dhxivi) 


Vi 




(!j'((),)log(A9i + yiTv^) 


dt 


Vi 


D, 


dX 


E 


{O^-0^)hY\OY 


= E[fi{ 0 i)\yi] +E[/2(6»,)|y,] + E[/3(6»i, ?/i)|y,]. 
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where 


Note that 


dhxivi) logo; 


dX 


^ - hoga: + - ) hx{x). 


E 


{nf(.o.,y^)\y^]y] < E[E[/(0„yO“l2/^]] =E[/(0.,2/.)“] 


for a > 0 from Jensen’s inequality. Since E[/i(0i)“] < oo, E[/ 2 ( 0 i)“] < oo, 
E[/ 3 ( 0 i, 2 /i)“] < oo for a > 0 , it follows that 


Similarly, we have 
y^{P) 


Dj^Xi 


E 


E 


dyti 

8 " 

dX 



< 00 . 


[A + Di)aj 
~ _ Dl 

2at{A + D,Y 

which leads to 

E 




E 




dyi 


d<l)k 


< oo, 


for k = l,...,p+l. Moreover, straightforward but considerable calculations 
shows that E /d(j)kd4>ef] < oo. Hence, we have i?i = o{m~^). A quite 

similar evaluation shows that R 2 = o(m~^), which leads to 

g 2 i( 0 )=E +o(m~^). 

Finally, using the similar argument given in the proof of Theorem 3 in Kubokawa 
et al. (2016), we have 


E 


($ - ($ - 0) = tr (e 

1 

m 




E[($-0)($-</>)*])+o(m 


—tr(y(</.)E 
m I 




I +o(m ^), 


which completes the proof. 


A3. Proof of Theorem 12.11 Taylor series expansion of gii{4>) around 4> 
gives 


Ebi*($)] = 9ii{(t>) + - 0] + itr 

o<p ^ 


^^E[( 0 -</,)($-</,)*] )+i? 3 , 


p+2 p+2 p+2 ^2 , - 

E dgiiw) 


Q’j 4‘j){4‘k 4’i)- 
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where 


































Since E[((/)j — — 4>k)(4>t. — </>^)] = o{m ^), it holds that E[ii 3 ] = o(m ^). 

Moreover, from Lemma l2.11 we have 


E[ffii($) - gu{(t>)\ 


1 dgii{(j)) 
m dff)* 





+ o(m ^), 


thereby, we have E[gii(cj}) — gii{4>)] = m~^C 2 { 4 >) + o(m“^) with the smooth 
function C2{4>). Hence, from Lemma [2^ and Butar and Lahiri (2003), we obtain 
the second order unbiasedness of (lO) 
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